DROPS

Document

From Big Data Theory to Big Data Practice (Dagstuhl Seminar 23071)

Authors: Martin Farach-Colton, Fabian Daniel Kuhn, Ronitt Rubinfeld, and Przemysław Uznański

Published in: Dagstuhl Reports, Volume 13, Issue 2 (2023)

Abstract

This report documents the program and the outcomes of Dagstuhl Seminar 23071 "From Big Data Theory to Big Data Practice". Some recent advances in the theory of algorithms for big data - sublinear/local algorithms, streaming algorithms and external memory algorithms - have translated into impressive improvements in practice, whereas others have remained stubbornly resistant to useful implementations. This seminar aimed to glean lessons for those aspect of these algorithms that have led to practical implementation to see if the lessons learned can both improve the implementations of other theoretical ideas and to help guide the next generation of theoretical advances.

Cite as

Martin Farach-Colton, Fabian Daniel Kuhn, Ronitt Rubinfeld, and Przemysław Uznański. From Big Data Theory to Big Data Practice (Dagstuhl Seminar 23071). In Dagstuhl Reports, Volume 13, Issue 2, pp. 33-46, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{farachcolton_et_al:DagRep.13.2.33,
  author =	{Farach-Colton, Martin and Kuhn, Fabian Daniel and Rubinfeld, Ronitt and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{From Big Data Theory to Big Data Practice (Dagstuhl Seminar 23071)}},
  pages =	{33--46},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2023},
  volume =	{13},
  number =	{2},
  editor =	{Farach-Colton, Martin and Kuhn, Fabian Daniel and Rubinfeld, Ronitt and Uzna\'{n}ski, Przemys{\l}aw},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagRep.13.2.33},
  URN =		{urn:nbn:de:0030-drops-191809},
  doi =		{10.4230/DagRep.13.2.33},
  annote =	{Keywords: external memory, local algorithms, sublinear algorithms}
}

Document

DOI: 10.4230/LIPIcs.ESA.2022.76

Cardinality Estimation Using Gumbel Distribution

Authors: Aleksander Łukasiewicz and Przemysław Uznański

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

Abstract

Cardinality estimation is the task of approximating the number of distinct elements in a large dataset with possibly repeating elements. LogLog and HyperLogLog (c.f. Durand and Flajolet [ESA 2003], Flajolet et al. [Discrete Math Theor. 2007]) are small space sketching schemes for cardinality estimation, which have both strong theoretical guarantees of performance and are highly effective in practice. This makes them a highly popular solution with many implementations in big-data systems (e.g. Algebird, Apache DataSketches, BigQuery, Presto and Redis). However, despite having simple and elegant formulation, both the analysis of LogLog and HyperLogLog are extremely involved - spanning over tens of pages of analytic combinatorics and complex function analysis. We propose a modification to both LogLog and HyperLogLog that replaces discrete geometric distribution with the continuous Gumbel distribution. This leads to a very short, simple and elementary analysis of estimation guarantees, and smoother behavior of the estimator.

Cite as

Aleksander Łukasiewicz and Przemysław Uznański. Cardinality Estimation Using Gumbel Distribution. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 76:1-76:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{lukasiewicz_et_al:LIPIcs.ESA.2022.76,
  author =	{{\L}ukasiewicz, Aleksander and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{Cardinality Estimation Using Gumbel Distribution}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{76:1--76:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.76},
  URN =		{urn:nbn:de:0030-drops-170140},
  doi =		{10.4230/LIPIcs.ESA.2022.76},
  annote =	{Keywords: Streaming algorithms, Cardinality estimation, Sketching, Gumbel distribution}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.18

The Dynamic k-Mismatch Problem

Authors: Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, and Przemysław Uznański

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length m and all length-m substrings of a given text of length n ≥ m. We focus on the well-studied k-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold k. Moreover, we assume n ≤ 2m (in general, one can partition the text into overlapping blocks). In this work, we develop data structures for the dynamic version of the k-mismatch problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, whereas a query, given an index i, returns the Hamming distance between the pattern and the text substring starting at position i, or reports that the distance exceeds k. First, we describe a simple data structure with 𝒪̃(1) update time and 𝒪̃(k) query time. Through considerably more sophisticated techniques, we show that 𝒪̃(k) update time and 𝒪̃(1) query time is also achievable. These two solutions likely provide an essentially optimal trade-off for the dynamic k-mismatch problem with m^{Ω(1)} ≤ k ≤ √m: we prove that, in that case, conditioned on the 3SUM conjecture, one cannot simultaneously achieve k^{1-Ω(1)} time for all operations (updates and queries) after n^{𝒪(1)}-time initialization. For k ≥ √m, the same lower bound excludes achieving m^{1/2-Ω(1)} time per operation. This is known to be essentially tight for constant-sized alphabets: already Clifford et al. (STACS 2018) achieved 𝒪̃(√m) time per operation in that case, but their solution for large alphabets costs 𝒪̃(m^{3/4}) time per operation. We improve and extend the latter result by developing a trade-off algorithm that, given a parameter 1 ≤ x ≤ k, achieves update time 𝒪̃(m/k +√{mk/x}) and query time 𝒪̃(x). In particular, for k ≥ √m, an appropriate choice of x yields 𝒪̃(∛{mk}) time per operation, which is 𝒪̃(m^{2/3}) when only the trivial threshold k = m is provided.

Cite as

Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, and Przemysław Uznański. The Dynamic k-Mismatch Problem. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 18:1-18:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{clifford_et_al:LIPIcs.CPM.2022.18,
  author =	{Clifford, Rapha\"{e}l and Gawrychowski, Pawe{\l} and Kociumaka, Tomasz and Martin, Daniel P. and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{The Dynamic k-Mismatch Problem}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{18:1--18:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.18},
  URN =		{urn:nbn:de:0030-drops-161454},
  doi =		{10.4230/LIPIcs.CPM.2022.18},
  annote =	{Keywords: Pattern matching, Hamming distance, dynamic algorithms}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2020.35

L_p Pattern Matching in a Stream

Authors: Tatiana Starikovskaya, Michal Svagerka, and Przemysław Uznański

Published in: LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

Abstract

We consider the problem of computing distance between a pattern of length n and all n-length subwords of a text in the streaming model. In the streaming setting, only the Hamming distance (L₀) has been studied. It is known that computing the exact Hamming distance between a pattern and a streaming text requires Ω(n) space (folklore). Therefore, to develop sublinear-space solutions, one must relax their requirements. One possibility to do so is to compute only the distances bounded by a threshold k, see [SODA'19, Clifford, Kociumaka, Porat] and references therein. The motivation for this variant of this problem is that we are interested in subwords of the text that are similar to the pattern, i.e. in subwords such that the distance between them and the pattern is relatively small. On the other hand, the main application of the streaming setting is processing large-scale data, such as biological data. Recent advances in hardware technology allow generating such data at a very high speed, but unfortunately, the produced data may contain about 10% of noise [Biol. Direct.'07, Klebanov and Yakovlev]. To analyse such data, it is not sufficient to consider small distances only. A possible workaround for this issue is the (1±ε)-approximation. This line of research was initiated in [ICALP'16, Clifford and Starikovskaya] who gave a (1±ε)-approximation algorithm with space 𝒪~(ε^{-5}√n). In this work, we show a suite of new streaming algorithms for computing the Hamming, L₁, L₂ and general L_p (0 < p < 2) distances between the pattern and the text. Our results significantly extend over the previous result in this setting. In particular, for the Hamming distance and for the L_p distance when 0 < p ≤ 1 we show a streaming algorithm that uses 𝒪~(ε^{-2}√n) space for polynomial-size alphabets.

Cite as

Tatiana Starikovskaya, Michal Svagerka, and Przemysław Uznański. L_p Pattern Matching in a Stream. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 35:1-35:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{starikovskaya_et_al:LIPIcs.APPROX/RANDOM.2020.35,
  author =	{Starikovskaya, Tatiana and Svagerka, Michal and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{L\underlinep Pattern Matching in a Stream}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)},
  pages =	{35:1--35:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-164-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{176},
  editor =	{Byrka, Jaros{\l}aw and Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.35},
  URN =		{urn:nbn:de:0030-drops-126381},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2020.35},
  annote =	{Keywords: streaming algorithms, approximate pattern matching}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2020.46

Improved Circular k-Mismatch Sketches

Authors: Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, and Przemysław Uznański

Published in: LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

Abstract

The shift distance sh(S₁,S₂) between two strings S₁ and S₂ of the same length is defined as the minimum Hamming distance between S₁ and any rotation (cyclic shift) of S₂. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings S₁ and S₂ of length n are given to two identical players (encoders), who independently compute sketches (summaries) sk(S₁) and sk(S₂), respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) sh(S₁,S₂) with high probability. This paper primarily focuses on the more general k-mismatch version of the problem, where the decoder is allowed to declare a failure if sh(S₁,S₂) > k, where k is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular k-mismatch sketches of size Õ(k+D(n)), where D(n) is the number of divisors of n. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular k-mismatch sketch of size Õ(k); this size matches communication-complexity lower bounds. We also design (1± ε)-approximate circular k-mismatch sketch of size Õ(min(ε^{-2}√k, ε^{-1.5}√n)), which improves upon an Õ(ε^{-2}√n)-size sketch of Crouch and McGregor (APPROX'11).

Cite as

Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, and Przemysław Uznański. Improved Circular k-Mismatch Sketches. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 46:1-46:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.APPROX/RANDOM.2020.46,
  author =	{Golan, Shay and Kociumaka, Tomasz and Kopelowitz, Tsvi and Porat, Ely and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{Improved Circular k-Mismatch Sketches}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)},
  pages =	{46:1--46:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-164-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{176},
  editor =	{Byrka, Jaros{\l}aw and Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.46},
  URN =		{urn:nbn:de:0030-drops-126492},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2020.46},
  annote =	{Keywords: Hamming distance, k-mismatch, sketches, rotation, cyclic shift, communication complexity}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.29

Approximating Text-To-Pattern Distance via Dimensionality Reduction

Authors: Przemysław Uznański

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

Text-to-pattern distance is a fundamental problem in string matching, where given a pattern of length m and a text of length n, over an integer alphabet, we are asked to compute the distance between pattern and the text at every location. The distance function can be e.g. Hamming distance or 𝓁_p distance for some parameter p > 0. Almost all state-of-the-art exact and approximate algorithms developed in the past ∼ 40 years were using FFT as a black-box. In this work we present 𝒪~(n/ε²) time algorithms for (1±ε)-approximation of 𝓁₂ distances, and 𝒪~(n/ε³) algorithm for approximation of Hamming and 𝓁₁ distances, all without use of FFT. This is independent to the very recent development by Chan et al. [STOC 2020], where 𝒪(n/ε²) algorithm for Hamming distances not using FFT was presented - although their algorithm is much more "combinatorial", our techniques apply to other norms than Hamming.

Cite as

Przemysław Uznański. Approximating Text-To-Pattern Distance via Dimensionality Reduction. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 29:1-29:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{uznanski:LIPIcs.CPM.2020.29,
  author =	{Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{Approximating Text-To-Pattern Distance via Dimensionality Reduction}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{29:1--29:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.29},
  URN =		{urn:nbn:de:0030-drops-121542},
  doi =		{10.4230/LIPIcs.CPM.2020.29},
  annote =	{Keywords: Approximate Pattern Matching, 𝓁₂ Distance, 𝓁₁ Distance, Hamming Distance, Approximation Algorithms, Combinatorial Algorithms}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2019.66

RLE Edit Distance in Near Optimal Time

Authors: Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, and Przemysław Uznański

Published in: LIPIcs, Volume 138, 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019)

Abstract

We show that the edit distance between two run-length encoded strings of compressed lengths m and n respectively, can be computed in O(mn log(mn)) time. This improves the previous record by a factor of O(n/log(mn)). The running time of our algorithm is within subpolynomial factors of being optimal, subject to the standard SETH-hardness assumption. This effectively closes a line of algorithmic research first started in 1993.

Cite as

Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, and Przemysław Uznański. RLE Edit Distance in Near Optimal Time. In 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 138, pp. 66:1-66:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{clifford_et_al:LIPIcs.MFCS.2019.66,
  author =	{Clifford, Rapha\"{e}l and Gawrychowski, Pawe{\l} and Kociumaka, Tomasz and Martin, Daniel P. and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{RLE Edit Distance in Near Optimal Time}},
  booktitle =	{44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019)},
  pages =	{66:1--66:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-117-7},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{138},
  editor =	{Rossmanith, Peter and Heggernes, Pinar and Katoen, Joost-Pieter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2019.66},
  URN =		{urn:nbn:de:0030-drops-110109},
  doi =		{10.4230/LIPIcs.MFCS.2019.66},
  annote =	{Keywords: String algorithms, Compression, Pattern matching, Run-length encoding}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2019.7

Faster Algorithms for All-Pairs Bounded Min-Cuts

Authors: Amir Abboud, Loukas Georgiadis, Giuseppe F. Italiano, Robert Krauthgamer, Nikos Parotsidis, Ohad Trabelsi, Przemysław Uznański, and Daniel Wolleb-Graf

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

Abstract

The All-Pairs Min-Cut problem (aka All-Pairs Max-Flow) asks to compute a minimum s-t cut (or just its value) for all pairs of vertices s,t. We study this problem in directed graphs with unit edge/vertex capacities (corresponding to edge/vertex connectivity). Our focus is on the k-bounded case, where the algorithm has to find all pairs with min-cut value less than k, and report only those. The most basic case k=1 is the Transitive Closure (TC) problem, which can be solved in graphs with n vertices and m edges in time O(mn) combinatorially, and in time O(n^{omega}) where omega<2.38 is the matrix-multiplication exponent. These time bounds are conjectured to be optimal. We present new algorithms and conditional lower bounds that advance the frontier for larger k, as follows: - A randomized algorithm for vertex capacities that runs in time {O}((nk)^{omega}). This is only a factor k^omega away from the TC bound, and nearly matches it for all k=n^{o(1)}. - Two deterministic algorithms for edge capacities (which is more general) that work in DAGs and further reports a minimum cut for each pair. The first algorithm is combinatorial (does not involve matrix multiplication) and runs in time {O}(2^{{O}(k^2)}* mn). The second algorithm can be faster on dense DAGs and runs in time {O}((k log n)^{4^{k+o(k)}}* n^{omega}). Previously, Georgiadis et al. [ICALP 2017], could match the TC bound (up to n^{o(1)} factors) only when k=2, and now our two algorithms match it for all k=o(sqrt{log n}) and k=o(log log n). - The first super-cubic lower bound of n^{omega-1-o(1)} k^2 time under the 4-Clique conjecture, which holds even in the simplest case of DAGs with unit vertex capacities. It improves on the previous (SETH-based) lower bounds even in the unbounded setting k=n. For combinatorial algorithms, our reduction implies an n^{2-o(1)} k^2 conditional lower bound. Thus, we identify new settings where the complexity of the problem is (conditionally) higher than that of TC. Our three sets of results are obtained via different techniques. The first one adapts the network coding method of Cheung, Lau, and Leung [SICOMP 2013] to vertex-capacitated digraphs. The second set exploits new insights on the structure of latest cuts together with suitable algebraic tools. The lower bounds arise from a novel reduction of a different structure than the SETH-based constructions.

Cite as

Amir Abboud, Loukas Georgiadis, Giuseppe F. Italiano, Robert Krauthgamer, Nikos Parotsidis, Ohad Trabelsi, Przemysław Uznański, and Daniel Wolleb-Graf. Faster Algorithms for All-Pairs Bounded Min-Cuts. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 7:1-7:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{abboud_et_al:LIPIcs.ICALP.2019.7,
  author =	{Abboud, Amir and Georgiadis, Loukas and Italiano, Giuseppe F. and Krauthgamer, Robert and Parotsidis, Nikos and Trabelsi, Ohad and Uzna\'{n}ski, Przemys{\l}aw and Wolleb-Graf, Daniel},
  title =	{{Faster Algorithms for All-Pairs Bounded Min-Cuts}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{7:1--7:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.7},
  URN =		{urn:nbn:de:0030-drops-105833},
  doi =		{10.4230/LIPIcs.ICALP.2019.7},
  annote =	{Keywords: All-pairs min-cut, k-reachability, network coding, Directed graphs, fine-grained complexity}
}

@InProceedings{abboud_et_al:LIPIcs.ICALP.2019.7,
  author =	{Abboud, Amir and Georgiadis, Loukas and Italiano, Giuseppe F. and Krauthgamer, Robert and Parotsidis, Nikos and Trabelsi, Ohad and Uzna\'{n}ski, Przemys{\l}aw and Wolleb-Graf, Daniel},
  title =	{{Faster Algorithms for All-Pairs Bounded Min-Cuts}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{7:1--7:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.7},
  URN =		{urn:nbn:de:0030-drops-105833},
  doi =		{10.4230/LIPIcs.ICALP.2019.7},
  annote =	{Keywords: All-pairs min-cut, k-reachability, network coding, Directed graphs, fine-grained complexity}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.14

Hamming Distance Completeness

Authors: Karim Labib, Przemysław Uznański, and Daniel Wolleb-Graf

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

We show, given a binary integer function diamond that is piecewise polynomial, that (+,diamond) vector products are equivalent under one-to-polylog reductions to the computation of the Hamming distance. Examples include the dominance and l_{2p+1} distances for constant p. Our results imply equivalence (up to polylog factors) between the complexity of computing All Pairs Hamming Distance, All Pairs l_{2p+1} Distance and Dominance Matrix Product, and equivalence between Hamming Distance Pattern Matching, l_{2p+1} Pattern Matching and Less-Than Pattern Matching. The resulting algorithms for l_{2p+1} Pattern Matching and All Pairs l_{2p+1}, for 2p+1 = 3,5,7,... are likely to be optimal, given lack of progress in improving upper bounds for Hamming distance in the past 30 years. While reductions between selected pairs of products were presented in the past, our work is the first to generalize them to a general class of functions, showing that a wide class of "intermediate" complexity problems are in fact equivalent.

Cite as

Karim Labib, Przemysław Uznański, and Daniel Wolleb-Graf. Hamming Distance Completeness. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{labib_et_al:LIPIcs.CPM.2019.14,
  author =	{Labib, Karim and Uzna\'{n}ski, Przemys{\l}aw and Wolleb-Graf, Daniel},
  title =	{{Hamming Distance Completeness}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.14},
  URN =		{urn:nbn:de:0030-drops-104853},
  doi =		{10.4230/LIPIcs.CPM.2019.14},
  annote =	{Keywords: fine grained complexity, approximate pattern matching, matrix products}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.15

Approximating Approximate Pattern Matching

Authors: Jan Studený and Przemysław Uznański

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

Given a text T of length n and a pattern P of length m, the approximate pattern matching problem asks for computation of a particular distance function between P and every m-substring of T. We consider a (1 +/- epsilon) multiplicative approximation variant of this problem, for l_p distance function. In this paper, we describe two (1+epsilon)-approximate algorithms with a runtime of O~(n/epsilon) for all (constant) non-negative values of p. For constant p >= 1 we show a deterministic (1+epsilon)-approximation algorithm. Previously, such run time was known only for the case of l_1 distance, by Gawrychowski and Uznański [ICALP 2018] and only with a randomized algorithm. For constant 0 <= p <= 1 we show a randomized algorithm for the l_p, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for Hamming distance (case of p=0) and of Gawrychowski and Uznański for l_1 distance.

Cite as

Jan Studený and Przemysław Uznański. Approximating Approximate Pattern Matching. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{studeny_et_al:LIPIcs.CPM.2019.15,
  author =	{Studen\'{y}, Jan and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{Approximating Approximate Pattern Matching}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.15},
  URN =		{urn:nbn:de:0030-drops-104865},
  doi =		{10.4230/LIPIcs.CPM.2019.15},
  annote =	{Keywords: Approximate Pattern Matching, l\underlinep Distance, l\underline1 Distance, Hamming Distance, Approximation Algorithms}
}

Search Results

Documents authored by Uznański, Przemysław

From Big Data Theory to Big Data Practice (Dagstuhl Seminar 23071)

Abstract

Cite as

Cardinality Estimation Using Gumbel Distribution

Abstract

Cite as

The Dynamic k-Mismatch Problem

Abstract

Cite as

L_p Pattern Matching in a Stream

Abstract

Cite as

Improved Circular k-Mismatch Sketches

Abstract

Cite as

Approximating Text-To-Pattern Distance via Dimensionality Reduction

Abstract

Cite as

RLE Edit Distance in Near Optimal Time

Abstract

Cite as

Faster Algorithms for All-Pairs Bounded Min-Cuts

Abstract

Cite as

Hamming Distance Completeness

Abstract

Cite as

Approximating Approximate Pattern Matching

Abstract

Cite as

Thanks for your feedback!

Could not send message